    Sublinear algorithms for local graph centrality estimation

    We study the complexity of local graph centrality estimation, with the goal of approximating the centrality score of a given target node while exploring only a sublinear number of nodes/arcs of the graph and performing a sublinear number of elementary operations. We develop a technique, which we apply to the PageRank and Heat Kernel centralities, for building a low-variance score estimator through a local exploration of the graph. We obtain an algorithm that, given any node in any graph of $m$ arcs, with probability $1-\delta$ computes a multiplicative $(1\pm\epsilon)$-approximation of its score by examining only $\tilde{O}(\min(m^{2/3}\Delta^{1/3}d^{-2/3},\, m^{4/5}d^{-3/5}))$ nodes/arcs, where $\Delta$ and $d$ are respectively the maximum and average outdegree of the graph (omitting for readability $\operatorname{poly}(\epsilon^{-1})$ and $\operatorname{polylog}(\delta^{-1})$ factors). A similar bound holds for computational complexity. We also prove a lower bound of $\Omega(\min(m^{1/2}\Delta^{1/2}d^{-1/2},\, m^{2/3}d^{-1/3}))$ for both query complexity and computational complexity. Moreover, our technique yields an $\tilde{O}(n^{2/3})$ query complexity algorithm for the graph access model of [Brautbar et al., 2010], widely used in social network mining; we show this algorithm is optimal up to a sublogarithmic factor. These are the first algorithms yielding worst-case sublinear bounds for general directed graphs and any choice of the target node.
    Comment: 29 pages, 1 figure
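    The paper's low-variance estimator is not spelled out in the abstract, but a minimal Monte Carlo baseline illustrates the quantities involved: the PageRank vector with restart probability $\alpha$ is the endpoint distribution of a random walk that starts at a uniform node and stops with probability $\alpha$ at each step. The sketch below (the `out_neighbors` interface and the dangling-node convention are our assumptions, not the paper's) estimates the score of one target node; it needs on the order of $1/(\epsilon^2 \pi(v))$ walks for a $(1\pm\epsilon)$-approximation, which is exactly the variance cost the paper's local technique improves on.

```python
import random

def pagerank_of_node(out_neighbors, target, n, alpha=0.15,
                     walks=100_000, rng=None):
    """Naive Monte Carlo estimate of the PageRank score of `target`.

    PageRank equals the endpoint distribution of a random walk that
    starts at a uniform node and terminates with probability `alpha`
    at each step, so the fraction of walks ending at `target` is an
    unbiased estimator of its score.
    """
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(walks):
        u = rng.randrange(n)              # uniform random start node
        while rng.random() >= alpha:      # continue with prob. 1 - alpha
            nbrs = out_neighbors(u)
            if not nbrs:                  # dangling node: teleport
                u = rng.randrange(n)      # (one common convention)
            else:
                u = rng.choice(nbrs)
        hits += (u == target)
    return hits / walks
```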

    Faster Subgraph Counting in Sparse Graphs

    A fundamental graph problem asks to compute the number of induced copies of a $k$-node pattern graph $H$ in an $n$-node graph $G$. The fastest algorithm to date is still the 35-year-old algorithm by Nesetril and Poljak [Nesetril and Poljak, 1985], with running time $f(k) \cdot O(n^{\omega \lfloor k/3 \rfloor + 2})$ where $\omega \le 2.373$ is the matrix multiplication exponent. In this work we show that, if one takes into account the degeneracy $d$ of $G$, then the picture becomes substantially richer and leads to faster algorithms when $G$ is sufficiently sparse. More precisely, after introducing a novel notion of graph width, the DAG-treewidth, we prove what follows. If $H$ has DAG-treewidth $\tau(H)$ and $G$ has degeneracy $d$, then the induced copies of $H$ in $G$ can be counted in time $f(d,k) \cdot \tilde{O}(n^{\tau(H)})$; and, under the Exponential Time Hypothesis, no algorithm can solve the problem in time $f(d,k) \cdot n^{o(\tau(H)/\ln \tau(H))}$ for all $H$. This result characterises the complexity of counting subgraphs in a $d$-degenerate graph. Developing bounds on $\tau(H)$, then, we obtain natural generalisations of classic results and faster algorithms for sparse graphs. For example, when $d = O(\operatorname{polylog}(n))$ we can count the induced copies of any $H$ in time $f(k) \cdot \tilde{O}(n^{\lfloor k/4 \rfloor + 2})$, beating the Nesetril-Poljak algorithm by essentially a cubic factor in $n$.
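    The running times above are parameterised by the degeneracy $d$ of $G$: the smallest $d$ such that every subgraph has a vertex of degree at most $d$. As a point of reference, here is a minimal sketch of the standard peeling procedure that computes $d$ together with a degeneracy ordering; the bucket bookkeeping below favours readability over the classic linear-time implementation.

```python
def degeneracy_ordering(adj):
    """Compute the degeneracy d of an undirected graph and an ordering
    in which every vertex has at most d neighbors appearing later,
    by repeatedly peeling a minimum-degree vertex.

    adj: dict mapping each vertex to the set of its neighbors.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    buckets = {}                      # buckets[k] = vertices of degree k
    for v, k in deg.items():
        buckets.setdefault(k, set()).add(v)
    removed, order, d = set(), [], 0
    for _ in range(len(adj)):
        k = min(b for b, s in buckets.items() if s)   # min remaining degree
        v = buckets[k].pop()
        d = max(d, k)
        order.append(v)
        removed.add(v)
        for w in adj[v]:              # update degrees of live neighbors
            if w not in removed:
                buckets[deg[w]].discard(w)
                deg[w] -= 1
                buckets.setdefault(deg[w], set()).add(w)
    return d, order
```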

    Correlation Clustering with Adaptive Similarity Queries

    In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so as to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries. On the one hand, we describe simple active learning algorithms, which provably achieve an almost optimal trade-off while giving cluster recovery guarantees, and we test them on different datasets. On the other hand, we prove information-theoretic bounds on the number of queries necessary to guarantee a prescribed disagreement bound. These results give a rich characterisation of the trade-off between queries and clustering error.
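    The paper's algorithms are not spelled out in the abstract; as a rough illustration of the query pattern, here is a sketch of pivot-based clustering in the style of KwikCluster, where every similarity is learned through a query: pick a random unclustered object as pivot, query it against all remaining objects, and cluster the positives with it. The interface (`objects`, `query`) is hypothetical, and the paper's trade-off comes from additionally capping the number of queries, which this sketch does not do.

```python
import random

def pivot_cluster(objects, query, rng=None):
    """Pivot-based correlation clustering driven by similarity queries.

    query(u, v) -> bool reveals the binary similarity of a pair; each
    call counts as one query (O(n) queries per pivot round).
    Returns a list of clusters (lists of objects).
    """
    rng = rng or random.Random(0)
    remaining = list(objects)
    clusters = []
    while remaining:
        pivot = remaining.pop(rng.randrange(len(remaining)))
        cluster, rest = [pivot], []
        for u in remaining:
            # One query per (pivot, u) pair decides u's side.
            (cluster if query(pivot, u) else rest).append(u)
        clusters.append(cluster)
        remaining = rest
    return clusters
```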

    Motif counting beyond five nodes

    Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price, however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state of the art, both in terms of the size of the input graph and of that of the graphlets.
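    To make the CC side concrete, below is a minimal sketch of one round of color coding for the simplest graphlet family, simple paths on $k \ge 2$ nodes (the paper extends the technique to general graphlet sampling, which this sketch does not cover). Vertices are colored uniformly at random with $k$ colors, and "colorful" paths, those using all $k$ colors, are counted by dynamic programming over (endpoint, used-color-subset) pairs.

```python
import random

def colorful_path_count(adj, k, rng=None):
    """One round of color coding: color vertices with k colors at random,
    then count colorful simple paths on k >= 2 vertices by DP over
    (endpoint, color-subset) states. adj: dict vertex -> set of neighbors.
    """
    rng = rng or random.Random(0)
    color = {v: rng.randrange(k) for v in adj}
    # paths[(v, S)] = number of colorful paths ending at v with color set S
    paths = {(v, 1 << color[v]): 1 for v in adj}
    for _ in range(k - 1):            # extend paths one vertex at a time
        nxt = {}
        for (v, S), cnt in paths.items():
            for w in adj[v]:
                c = 1 << color[w]
                if not S & c:         # only extend with an unused color
                    key = (w, S | c)
                    nxt[key] = nxt.get(key, 0) + cnt
        paths = nxt
    # Each undirected path is found once from each endpoint.
    return sum(paths.values()) // 2
```

    Since a fixed $k$-node path is colorful with probability $k!/k^k$, averaging this count over independent colorings and rescaling by $k^k/k!$ gives an unbiased estimate of the total number of $k$-node paths.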

    The Limits of Popularity-Based Recommendations, and the Role of Social Ties

    In this paper we introduce a mathematical model that captures some of the salient features of recommender systems that are based on popularity and that try to exploit social ties among the users. We show that, under very general conditions, the market always converges to a steady state, for which we are able to give an explicit form. Thanks to this we can tell rather precisely how much a market is altered by a recommendation system, and determine the power of users to influence others. Our theoretical results are complemented by experiments with real-world social networks showing that social graphs prevent large market distortions in spite of the presence of highly influential users.
    Comment: 10 pages, 9 figures, KDD 201
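    The exact model is not given in the abstract; as a loose illustration of the popularity-feedback dynamics it studies, here is a toy simulation (all names and parameters hypothetical, and the social-tie component of the paper's model omitted) in which each user follows the current popularity distribution with some probability and picks uniformly otherwise, with market shares settling toward a steady state over rounds.

```python
import random

def simulate_market(num_items, num_users, rounds=200,
                    follow_prob=0.5, rng=None):
    """Toy popularity-feedback market: each round, every user picks an
    item proportionally to current popularity with prob. `follow_prob`,
    and uniformly at random otherwise. Returns the final market shares.
    """
    rng = rng or random.Random(0)
    counts = [1] * num_items                  # smoothed popularity counts
    for _ in range(rounds):
        new = [1] * num_items                 # smoothing keeps items alive
        for _ in range(num_users):
            if rng.random() < follow_prob:    # popularity-biased choice
                i = rng.choices(range(num_items), weights=counts)[0]
            else:                             # intrinsic/uniform choice
                i = rng.randrange(num_items)
            new[i] += 1
        counts = new
    total = sum(counts)
    return [c / total for c in counts]
```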